Abstract

We examine the following version of a classic combinatorial search problem introduced
by Rényi: Given a finite set $X$ of $n$ elements we want to identify an unknown
subset $Y \subset X$ of exactly $d$ elements by testing, by as few as possible subsets $A$ of
$X$, whether $A$ contains an element of $Y$ or not. We are primarily concerned with the
model where the family of test sets is specified in advance (non-adaptive) and each test
set is of size at most a given $k$. Our main results are asymptotically sharp bounds on
the minimum number of tests necessary for fixed $d$ and $k$ and for $n$ tending to infinity.


1 Introduction 

We consider a central question of combinatorial search theory in which, given a set $X$, we
wish to identify a particular subset $Y$ of unknown elements of $X$. We call the elements of $Y$
defective. To this end, we are allowed to construct a family $\mathcal{A}$ of queries. Each query in $\mathcal{A}$
corresponds to a subset $A \subseteq X$ and we receive a positive result if and only if $A$ contains at
least one of the elements of $Y$. Here we are concerned with the so-called non-adaptive case, in
which the queries are chosen in advance, so we cannot modify $\mathcal{A}$ based on the answers to some
of the queries. The typical goal is to find the minimum size of a family $\mathcal{A}$ that is required to
determine any set $Y$. This question is related to many practical problems, amongst which
are Wassermann-type blood tests, chemical analysis and the defective coin problem [5], [16]. A
comprehensive overview of the main types of combinatorial search problems can be found in
a survey by Katona [13] or the monograph of Du and Hwang [5].

In order to identify a fixed defective set of size at most (or exactly) $d$ among $n$ elements
it is well known (see e.g. [?]) that the number of queries required, denoted $q(n, d)$, satisfies

$$\Omega\left(\frac{d^2}{\log d} \log n\right) \le q(n, d) \le O(d^2 \log n).$$

In this paper, we restrict our attention to the case where the query sets may only be of
size at most $k$. For this model, the particular case where $Y$ contains a single element was
posed as a problem by Rényi [?] and solved by Katona [?] for $k < n/2$. Katona determined
the exact form of a matrix representing an optimal search and used this to find upper and
lower estimates for the minimum number of queries. While the lower bound provided is the best
known, the upper bound was subsequently improved by Wegener [?] and Luzgin [?]. In
2008 Ahlswede [1] proved that the lower bound is asymptotically tight.

Let $X$ be a set of size $n$ and $Y \subseteq X$ be a set of defective elements of size at most $d$.
Let $\bar{q}(n, d, k)$ denote the least number of queries of size at most $k$ necessary to identify $Y$.
In 2013, Hosszu, Tapolcai and Wiener [?] strengthened Katona's result while providing a
proof entirely relying on linear algebraic methods.


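Theorem 1 (Hosszu, Tapolcai and Wiener). For $k < n/2$, $\bar{q}(n, 1, k)$ is the least number $q$
for which there exist positive integers $j \le q - 1$ and $a < \binom{q}{j+1}$ such that

$$\sum_{i=0}^{j} i \binom{q}{i} + a(j + 1) \le kq,$$

$$\sum_{i=0}^{j} \binom{q}{i} + a = n.$$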


When $n$ is large enough this gives the following corollary.

Corollary 2 (Hosszu, Tapolcai and Wiener). If $n \ge \binom{k+1}{2} + 1$, then

$$\bar{q}(n, 1, k) = \left\lceil \frac{2n - 2}{k + 1} \right\rceil.$$


When the defective set is of size at most $d$, where $d > 1$, D'yachkov and Rykov [7] proved
a general lower bound and found conditions for when this lower bound is sharp (see also
Füredi and Ruszinkó [10]).

Theorem 3 (D'yachkov and Rykov). If $n > k > d \ge 2$, then

$$\left\lceil \frac{dn}{k} \right\rceil \le \bar{q}(n, d, k).$$

Furthermore, if $d \ge 3$, $k \ge d + 1$ and $n = k^d$, then

$$\bar{q}(n, d, k) = \left\lceil \frac{dn}{k} \right\rceil = dk^{d-1}.$$

In light of the above results, we focus on the case when the defective set $Y$ has size
exactly $d$. This allows for a smaller number of queries to determine $Y$. Define $q(n, d, k)$
as the minimum number of queries of size at most $k$ needed to find a fixed defective set $Y$ of size exactly $d$
among $n$ elements. Our main theorem gives bounds on $q(n, d, k)$ that are asymptotically
sharp when $d$ is even.


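Theorem 4. Fix an integer $d \ge 2$. If $n > k \ge \lfloor d/2 \rfloor + 1$, then

$$\left\lceil \frac{(\lfloor d/2 \rfloor + 1) n}{k} \right\rceil - 1 \le q(n, d, k).$$

Furthermore, if $k \ge 2$ and $n$ is sufficiently large, then

$$q(n, d, k) \le \left\lceil \frac{(\lceil d/2 \rceil + 1)(n - 1)}{k} \right\rceil + (\lceil d/2 \rceil + 1).$$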


Note that by querying singleton sets we can identify any defective set of size exactly $d$
using $n - 1$ queries (or a defective set of any size using $n$ queries). So we have the trivial
upper bound $q(n, d, k) \le n - 1$. In particular, this means that the upper bound above is
only of interest when $k > \lceil d/2 \rceil + 1$.
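
As a rough numerical illustration, the following Python sketch evaluates the two bounds of Theorem 4 next to the trivial bound $n - 1$. It assumes the bounds read $\lceil (\lfloor d/2 \rfloor + 1)n/k \rceil - 1$ and $\lceil (\lceil d/2 \rceil + 1)(n-1)/k \rceil + \lceil d/2 \rceil + 1$; the helper names `ceil_div` and `theorem4_bounds` are ours, and the upper bound is only claimed for $n$ sufficiently large.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for positive integers, without floating point."""
    return -(-a // b)


def theorem4_bounds(n: int, d: int, k: int):
    """Return (lower, upper, trivial) bounds on q(n, d, k).

    lower   = ceil((floor(d/2) + 1) * n / k) - 1
    upper   = ceil((ceil(d/2) + 1) * (n - 1) / k) + ceil(d/2) + 1
              (assumed to hold only for n sufficiently large)
    trivial = n - 1, obtained by querying singletons
    """
    lo_ell = d // 2 + 1          # floor(d/2) + 1
    hi_ell = (d + 1) // 2 + 1    # ceil(d/2) + 1
    lower = ceil_div(lo_ell * n, k) - 1
    upper = ceil_div(hi_ell * (n - 1), k) + hi_ell
    return lower, upper, n - 1


print(theorem4_bounds(1000, 4, 10))  # (299, 303, 999)
```

For even $d$ the two bounds differ only by a small additive term, while for odd $d$ the multiplicative constants $\lfloor d/2 \rfloor + 1$ and $\lceil d/2 \rceil + 1$ differ.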

When d = 2, 3 these bounds can be improved. 

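Theorem 5. Let $n > k$ be positive integers with $n$ sufficiently large, then

(a)
$$q(n, 2, k) = \left\lceil \frac{2(n - 1)}{k} \right\rceil,$$

(b)
$$\left\lceil \frac{3n - 2}{k + 3} \right\rceil \le q(n, 3, k) \le \left\lceil \frac{3n}{k} \right\rceil + 2.$$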


We use the notation $[n]$ to denote the set $\{1, 2, \ldots, n\}$. A family $\mathcal{A}$ of subsets of $[n]$ is
called $d$-separating ($\bar{d}$-separating) if for any two distinct sets $D_1, D_2 \subseteq [n]$ of size $d$ (at most
$d$, respectively) we have a member $A \in \mathcal{A}$ such that either

$$A \cap D_1 \ne \emptyset \text{ and } A \cap D_2 = \emptyset,$$

or

$$A \cap D_1 = \emptyset \text{ and } A \cap D_2 \ne \emptyset.$$

It is well known that for a fixed defective set $Y$ of size $d$ a family of queries $\mathcal{A}$ can determine
$Y$ if and only if $\mathcal{A}$ is a $d$-separating family. The separating property is monotone in the
following sense: a $d$-separating family is $\ell$-separating for any $\ell < d$.

For $|\mathcal{A}| = q$, we can form a $q \times n$ matrix $M$ such that the rows of $M$ are the characteristic
vectors of the members of $\mathcal{A}$. The columns of $M$ can be thought of as characteristic vectors
of a hypergraph $\mathcal{H}$ on the vertex set $[q]$. It is easy to see that $\mathcal{A}$ is $d$-separating if and only if
$\mathcal{H}$ is $d$-union-free, that is, every collection of exactly $d$ distinct members of $\mathcal{H}$ has a unique
union. For a given family of queries $\mathcal{A}$ we will call such a hypergraph $\mathcal{H}$ the dual hypergraph
of $\mathcal{A}$. Note that in this model it is possible for $\mathcal{H}$ to have the empty set as a hyperedge.¹
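
The correspondence between separating families and union-free dual hypergraphs can be checked by brute force on small instances. The Python sketch below is only an illustration: the function names are ours, elements are labelled $1, \ldots, n$, queries are indexed from 0, and the search is exponential in $d$.

```python
from itertools import combinations


def is_d_separating(family, n, d):
    """True if all d-subsets of {1..n} produce distinct answer vectors."""
    seen = set()
    for D in combinations(range(1, n + 1), d):
        answers = tuple(bool(A & set(D)) for A in family)
        if answers in seen:
            return False
        seen.add(answers)
    return True


def dual_hypergraph(family, n):
    """Hyperedge of element x = set of indices of the queries containing x."""
    return [frozenset(i for i, A in enumerate(family) if x in A)
            for x in range(1, n + 1)]


def is_d_union_free(edges, d):
    """True if every choice of d hyperedges (by index) has a distinct union."""
    seen = set()
    for idx in combinations(range(len(edges)), d):
        union = frozenset().union(*(edges[i] for i in idx))
        if union in seen:
            return False
        seen.add(union)
    return True


singletons = [{1}, {2}, {3}]   # n - 1 singleton queries on [4]
halves = [{1, 2}, {3, 4}]      # cannot tell {1, 3} from {1, 4}
for fam in (singletons, halves):
    print(is_d_separating(fam, 4, 2),
          is_d_union_free(dual_hypergraph(fam, 4), 2))
```

In the first family the element 4 belongs to no query, so its hyperedge in the dual hypergraph is the empty set.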

Clearly if a family is $\bar{d}$-separating, then it is also $d$-separating. Chen and Hwang [3]
provide a relationship in the other direction.

Theorem 6 (Chen and Hwang). If $\mathcal{A}$ is a $2d$-separating family, then there exists a $\overline{d+1}$-separating
family $\mathcal{A}'$ obtained by adding at most one new element to the ground set of $\mathcal{A}$.

This theorem is the primary tool for the lower bound of Theorem 4. Chen and Hwang
state that their theorem is weak in the sense that it should be possible to construct an $\bar{\ell}$-separating
family $\mathcal{A}'$ for $\ell > d + 1$. However, it follows from our upper bound in Theorem 4
that in general $\ell$ cannot be improved to $d + 2$.


2 General bounds on $q(n, d, k)$


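We begin by proving the lower bound in Theorem 4. Suppose that $\mathcal{A}$ is a $d$-separating family
on ground set $[n]$ with query size at most $k$ such that

$$|\mathcal{A}| < \left\lceil \frac{(\lfloor d/2 \rfloor + 1) n}{k} \right\rceil - 1.$$

Set $\ell = \lfloor d/2 \rfloor + 1$. By monotonicity $\mathcal{A}$ is $2\lfloor d/2 \rfloor$-separating, so by Theorem 6 we can add at most one new element to the ground set of
$\mathcal{A}$ in order to obtain an $\bar{\ell}$-separating family $\mathcal{A}'$ such that

$$|\mathcal{A}'| < \left\lceil \frac{\ell n}{k} \right\rceil.$$

This contradicts the lower bound given by Theorem 3.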

To prove the upper bound in Theorem 4 we show an explicit construction of the dual
hypergraph. Recall that a hypergraph $\mathcal{H}$ is $\ell$-uniform if all hyperedges are of size $\ell$.
Furthermore a hypergraph $\mathcal{H}$ is linear if every two hyperedges intersect in at most one vertex.

A hypergraph is said to be a cycle if it has at least two edges and there exists a cyclic
ordering of its edges $\{e_1, \ldots, e_\ell\}$ such that there are distinct vertices $v_1, \ldots, v_\ell$ such that
$v_i \in e_i \cap e_{i+1}$ (where $e_{\ell+1} = e_1$). This concept of a cycle in a hypergraph is sometimes called a
Berge-cycle, after C. Berge [2]. The length of a cycle is the number of edges it contains and
the girth of a hypergraph is the length of the shortest cycle it contains. We use the terms
triangle and $C_4$ to refer to hypergraph cycles with three and four hyperedges, respectively.

We begin with a lemma relating the uniformity and the property of being union-free for 
hypergraphs of girth at least 5. 


Lemma 7. Let $\ell \ge 2$ and $G$ be an $\ell$-uniform linear hypergraph. If $G$ has girth at least 5,
then $G$ is $(2\ell - 2)$-union-free.

¹ Note that the property that $\mathcal{A}$ is $d$-separating prevents the dual hypergraph from having
multi-hyperedges.

Proof. Suppose $G$ is not $(2\ell - 2)$-union-free, then there exist two distinct collections of edges
$\mathcal{D} = \{D_1, \ldots, D_{2\ell-2}\}$ and $\mathcal{E} = \{E_1, \ldots, E_{2\ell-2}\}$ such that

$$\bigcup_{i=1}^{2\ell-2} D_i = \bigcup_{i=1}^{2\ell-2} E_i.$$

Consider the case when there are two sets $D_1, D_2$ that are both not members of $\mathcal{E}$. If $D_1$
and $D_2$ are not disjoint, then their union has $2\ell - 1$ elements. These elements are covered
by the union of the $E_i$'s and each $E_i$ contains at most one element from each of the sets
$D_1$ and $D_2$. Thus there must be an $E_i$ that intersects both $D_1$ and $D_2$ in distinct vertices.
Therefore, $E_i, D_1, D_2$ form a triangle; a contradiction. On the other hand, if $D_1$ and $D_2$ are
disjoint, then their union has $2\ell$ elements. These elements are covered by the union of the
$E_i$'s and each $E_i$ contains at most one element from each of the sets $D_1$ and $D_2$. Thus there
must be an $E_i$ and $E_j$ that intersect both $D_1$ and $D_2$ in four distinct vertices. Therefore,
$E_i, D_1, E_j, D_2$ form a $C_4$; a contradiction.

Consider the case when there is exactly one set in $\mathcal{D}$ (say $D_1$) that is not a member of $\mathcal{E}$.
Consequently, there is a set in $\mathcal{E}$ (say $E_1$) that is not in $\mathcal{D}$. The remaining $2\ell - 3$ members
of $\mathcal{D}$ and $\mathcal{E}$ are the same. The union of $D_1$ and $E_1$ has at least $2\ell - 1$ elements. All the
vertices of the union except for the intersection must be covered by the remaining $2\ell - 3$
edges of $\mathcal{D}$. As in the previous case we get either a triangle or a $C_4$ (depending on whether
$D_1$ and $E_1$ intersect); a contradiction. □


Ellis and Linial [8] (using a result of Cooper, Frieze, Molloy, and Reed [1]) constructed a
regular uniform hypergraph with girth at least 5.


Theorem 8 (Ellis and Linial [8]). Fix integers $\ell \ge 3$ and $k \ge 2$. Then for every $m$ large
enough such that $\ell$ divides $m$, there exists a linear $k$-regular $\ell$-uniform hypergraph on $m$
vertices with girth at least 5.


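We now construct a $d$-union-free hypergraph on at most

$$\left\lceil \frac{(\lceil d/2 \rceil + 1)(n - 1)}{k} \right\rceil + (\lceil d/2 \rceil + 1)$$

vertices and with at least $n$ hyperedges. This will be the dual hypergraph of a $d$-separating
family of sets which gives the upper bound in Theorem 4.

Set $\ell = \lceil d/2 \rceil + 1$ and let $q$ be the smallest integer such that $\lceil \ell(n - 1)/k \rceil \le q$ and $q$ is divisible
by $\ell$. Thus

$$\left\lceil \frac{\ell(n - 1)}{k} \right\rceil \le q \le \left\lceil \frac{\ell(n - 1)}{k} \right\rceil + \ell = \left\lceil \frac{(\lceil d/2 \rceil + 1)(n - 1)}{k} \right\rceil + (\lceil d/2 \rceil + 1).$$

Let $\mathcal{H}$ be a linear $k$-regular $\ell$-uniform hypergraph on $q$ vertices (by Theorem 8). The number
of hyperedges in $\mathcal{H}$ is

$$\frac{kq}{\ell} \ge \frac{k}{\ell} \left\lceil \frac{\ell(n - 1)}{k} \right\rceil \ge n - 1.$$

By Lemma 7 $\mathcal{H}$ is $d$-union-free (in fact, when $d$ is odd $\mathcal{H}$ is $(d + 1)$-union-free). Now let
us add the empty set (as a hyperedge) to $\mathcal{H}$ to get a hypergraph with at least $n$ hyperedges.
This new hypergraph is still $d$-union-free.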

3 Bounds for small defective sets

In this section we prove Theorem 5.


3.1 Two defective elements: Proof of Theorem 5(a)

Instead of applying the theorem of Ellis and Linial as above, we can use a version of the
classic result of Erdős and Sachs [9] on the existence of graphs of arbitrary girth. This allows
for a concrete bound on the threshold for $n$.


Theorem 9 (Erdős and Sachs). Fix integers $k \ge 2$ and $g \ge 4$, and let $m \ge 4k^g$ be an even
integer. Then there exists a $k$-regular graph on $m$ vertices with girth at least $g$.

The following proposition gives the upper bound in Theorem 5(a).

Proposition 10. Fix an integer $k \ge 2$ and let $n \ge 2k^7$. Then

$$q(n, 2, k) \le \left\lceil \frac{2(n - 1)}{k} \right\rceil.$$


Proof. Let $q = \lceil 2(n - 1)/k \rceil \ge 4k^6$. We will construct a graph $G$ with girth at least 5 on $q$
vertices with $n - 1$ edges. By Lemma 7 we have that $G$ is a 2-union-free (hyper)graph.
Then we add the empty set (as a hyperedge) to $G$ to get a hypergraph on $q$ vertices with
$n$ hyperedges. It is easy to see that if $G$ is 2-union-free, then adding the empty set cannot
destroy the 2-union-free property.

We distinguish two cases based on the parity of $q$. First let us assume $q$ is even. By
Theorem 9 there exists a $k$-regular graph $G$ on $q$ vertices with girth at least 5. The number
of edges in $G$ is $qk/2 \ge n - 1$.

Now suppose $q$ is odd. By Theorem 9 there exists a $k$-regular graph $G'$ on $q + 1$ vertices
with girth at least 6. Let us remove an arbitrary vertex $x$ from $G'$. Let $X$ be the neighborhood
of $x$. The graph $G'$ is triangle-free, so $X$ is an independent set. Furthermore, $G'$ is $k$-regular,
so $|X| = k$, so we can add a matching of size $\lfloor k/2 \rfloor$ to the vertices of $X$. Let the resulting
graph be $G$. It is easy to see that as $G'$ had girth at least 6, the graph $G$ will have girth at
least 5. The number of edges in $G$ is at least

$$\frac{k(q + 1)}{2} - k + \left\lfloor \frac{k}{2} \right\rfloor \ge n - 1 + \frac{k}{2} - k + \left\lfloor \frac{k}{2} \right\rfloor \ge n - 1 - \frac{1}{2}.$$

Since the number of edges is an integer, the number of edges in $G$ is at least $n - 1$. □
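
A small instance of this kind of construction can be checked directly. The Python sketch below uses the Petersen graph (3-regular, girth 5, 10 vertices, 15 edges) as the dual hypergraph. This choice is ours and lies far below the threshold of Proposition 10, so it only illustrates the construction. Adding the empty hyperedge gives $n = 16$ elements and $q = 10$ queries of size $k = 3$, which matches $\lceil 2(n-1)/k \rceil = 10$.

```python
from itertools import combinations


def petersen_edges():
    """Edges of the Petersen graph on vertices 0..9, as frozensets."""
    outer = [(i, (i + 1) % 5) for i in range(5)]
    spokes = [(i, i + 5) for i in range(5)]
    inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
    return [frozenset(e) for e in outer + spokes + inner]


def is_two_union_free(edges):
    """True if all unions of two distinct hyperedges are pairwise different."""
    unions = [a | b for a, b in combinations(edges, 2)]
    return len(unions) == len(set(unions))


def queries_from_dual(edges, q):
    """Query v contains element j (1-based) iff hyperedge j contains vertex v."""
    return [{j + 1 for j, e in enumerate(edges) if v in e} for v in range(q)]


# Dual hypergraph: 15 graph edges plus the empty hyperedge, so n = 16.
dual = petersen_edges() + [frozenset()]
queries = queries_from_dual(dual, 10)

print(len(dual), len(queries), {len(Q) for Q in queries})
print(is_two_union_free(dual))
```

Each vertex of the graph becomes one query, and its degree is the size of that query.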


We now prove the lower bound on $q(n, 2, k)$. Fix $k$ and $n$ and let $q$ be the minimal
integer such that there exists a 2-separating family of size $q$ with query size at most $k$. In the dual
hypergraph $\mathcal{H}$, let $e \le 1$ be the number of hyperedges of size 0, let $s$ be the number of
hyperedges of size 1 that are contained in a hyperedge of size at least 3, let $s'$ be the number
of remaining hyperedges of size 1 and let $t$ be the number of hyperedges of size at least 3.
Therefore, the number of hyperedges of size 2 is $n - e - s - s' - t$. We need two simple
lemmas relating these values.


Lemma 11. Every hyperedge of size at least 3 of $\mathcal{H}$ contains at most one hyperedge of size
1. That is, $s \le t$.

Proof. Assume otherwise that the hyperedge $h$ contains two hyperedges, say $\{a\}$ and $\{b\}$.
Then $h \cup \{a\} = h = h \cup \{b\}$, contradicting the 2-union-free property of $\mathcal{H}$. □

Lemma 12. If $\{a\}$ and $\{a, b\}$ are hyperedges of $\mathcal{H}$, then the degree of the vertex $b$ is 1.

Proof. Assume there is an edge incident to $b$, called $h$, different from $\{a, b\}$. Then $h \cup \{a\} =
h \cup \{a, b\}$, contradicting the 2-union-free property of $\mathcal{H}$. □

Now let us count the number of pairs $(v, h)$ where $v$ is a vertex and $h$ is a hyperedge of
$\mathcal{H}$ such that $v \in h$.

The maximum degree in $\mathcal{H}$ is $k$. Furthermore, for each hyperedge $\{a\}$ of size 1 not
contained in a hyperedge of size at least 3, there is a vertex of degree 1 in $\mathcal{H}$. Indeed, either
$\{a\}$ is isolated and thus $a$ is of degree 1 or $\{a\}$ is in some hyperedge $\{a, b\}$ and by Lemma 12
we have that $b$ is of degree 1. Thus we have at least $s'$ vertices of degree 1. This implies that
the number of pairs $(v, h)$ is at most

$$k(q - s') + s' = kq - (k - 1)s'.$$

On the other hand by counting the size of all hyperedges we get the number of pairs $(v, h)$
is at least

$$s + s' + 3t + 2(n - e - s - s' - t).$$

Applying Lemma 11 gives

$$s + s' + 3t + 2(n - e - s - s' - t) \ge 2s + s' + 2t + 2(n - e - s - s' - t) = 2n - 2e - s' \ge 2n - 2 - s'.$$

Combining the upper and lower estimates for the number of pairs $(v, h)$ yields

$$2n - 2 - s' \le kq - (k - 1)s'.$$

Using the fact that $k \ge 2$ and solving for $q$ gives the lower bound.
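
Spelled out, the last step rearranges as follows, using $k \ge 2$ and $s' \ge 0$ for the second inequality and the integrality of $q$ for the ceiling:

```latex
2n - 2 - s' \le kq - (k-1)s'
\;\Longrightarrow\;
kq \ge 2n - 2 + (k-2)s' \ge 2n - 2
\;\Longrightarrow\;
q \ge \left\lceil \frac{2(n-1)}{k} \right\rceil .
```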

3.2 Three defective elements: Proof of Theorem 5(b)

The upper bound follows from Theorem 4.

For the lower bound fix $k$ and $n$ and let $q$ be the minimal integer such that there exists
a 3-separating family $\mathcal{A}$ of size $q$ with query size at most $k$. Let $\mathcal{H}$ be the dual hypergraph for $\mathcal{A}$.
Note that this hypergraph is not necessarily uniform.

As in the case where $d = 2$, we sum the sizes of all hyperedges in $\mathcal{H}$. There is at most 1
hyperedge of size 0 and at most $q$ many hyperedges of size 1. First we show that there are
not too many hyperedges of size 2 in $\mathcal{H}$.

Lemma 13. The graph $G$ formed by the hyperedges of size 2 in $\mathcal{H}$ is a forest.

Proof. We show that $G$ contains no triangle, no $C_4$ and no path of length 4. Such a graph
is clearly a forest.

Recall that a 3-separating family is also 2-separating. Let $e, f, g$ be the edges of a triangle
in $G$, then it is immediate that $e \cup f = e \cup g$ which violates the 2-union-free property of
$\mathcal{H}$. Similarly, if $e, f, g, h$ are the edges of a $C_4$ in $G$ (such that $e$ and $g$ are disjoint), then
$e \cup g = f \cup h$ which violates the 2-union-free property of $\mathcal{H}$. Finally, if $e, f, g, h$ are the edges
of a path of length 4 (in this order), then $e \cup f \cup h = e \cup g \cup h$ which violates the 3-union-free
property of $\mathcal{H}$. □

Thus we have at most $q - 1$ hyperedges of size 2 in $\mathcal{H}$. Therefore, the sum of the sizes of
the hyperedges of $\mathcal{H}$ is at least

$$q + 2(q - 1) + 3(n - 1 - 2q + 1) = 3n - 3q - 2.$$

The maximum degree in $\mathcal{H}$ is $k$ so the above sum is at most $qk$. Combining these two
estimates and solving for $q$ yields

$$q \ge \frac{3n - 2}{k + 3}.$$


4 Further results 


4.1 Fixed query size 


Throughout the paper we have allowed queries to have size at most $k$. Katona [12] showed
that when searching for a fixed defective set of size at most 1 there is no difference in the
minimum number of necessary queries whether we restrict the query size to be at most $k$
or to be exactly $k$. Therefore it is somewhat unexpected that in the case of searching for a
fixed defective set of size exactly $d$, for $d \ge 2$, we can have different answers depending on
whether the query size is at most $k$ or exactly $k$.

To illustrate, let us examine the simplest case when $k = 2$ and the defective set is of size
exactly $d$, for a given $d \ge 3$. We distinguish between two kinds of restrictions: (i) each query
set is of size at most 2 or (ii) each query set is of size exactly 2. By asking queries of size 1
we can identify the defective set with $n - 1$ queries, so $q(n, d, 2) \le n - 1$. On the other hand,
if a family of queries is $d$-separating, then it is 2-separating and we can use Theorem 5 to
get

$$q(n, d, 2) \ge q(n, 2, 2) = \left\lceil \frac{2(n - 1)}{2} \right\rceil = n - 1.$$

Therefore, we have the following simple corollary. 


Corollary 14. If $n > d \ge 3$, then

$$q(n, d, 2) = n - 1.$$

However, if we only allow queries of size exactly 2 we cannot determine such a defective 
set with only n — 1 queries. 

Proposition 15. Let $q$ be the minimum number such that there exists a family of queries of
size exactly 2 that can determine any defective set of size $d$, for a given $d \ge 3$. Then $q \ge n$.

Proof. Let $\mathcal{H}$ be the dual hypergraph for the family of queries in the statement of the
proposition. Then $\mathcal{H}$ is a 2-regular hypergraph on $q$ vertices with $n$ hyperedges. If all
hyperedges of $\mathcal{H}$ are of size 2, then $\mathcal{H}$ is a forest (see Lemma 13 in Subsection 3.2). Therefore,
$\mathcal{H}$ has at least $n + 1$ vertices, i.e. $q \ge n + 1 > n$ and we are done.

Therefore, we may assume that $\mathcal{H}$ has at least one hyperedge of size other than 2. In $\mathcal{H}$
let $e \le 1$ be the number of hyperedges of size 0, let $s$ be the number of hyperedges of size 1,
and let $t$ be the number of hyperedges of size at least 3. Clearly as $\mathcal{H}$ is 2-regular it cannot
contain an isolated hyperedge of size 1. Furthermore, $\mathcal{H}$ cannot contain two hyperedges of
the form $\{a\}$ and $\{a, b\}$ as $b$ must have degree 1 in this case (see Lemma 12 in Subsection 3.1).
Therefore, every hyperedge of size 1 is contained in a hyperedge of size at least 3, and so by Lemma 11 $s \le t$.
The sum of degrees in $\mathcal{H}$ is $2q$. By counting the sizes of all edges in $\mathcal{H}$ we obtain that

$$2q \ge s + 3t + 2(n - e - s - t).$$

If $e = 0$, then using the fact that $s \le t$ it follows that

$$2q \ge s + 3t + 2(n - s - t) \ge 2s + 2t + 2(n - s - t) = 2n$$

and we are done.

Now let us suppose that $e = 1$, i.e. $\mathcal{H}$ contains the empty set as a hyperedge. In this
case it is easy to see that $\mathcal{H}$ cannot contain any hyperedges of size 1. Indeed if $\{a\}$ is a
hyperedge, then there must be some $\{a, b, c\}$ hyperedge, but then

$$\{a\} \cup \{a, b, c\} = \emptyset \cup \{a, b, c\}$$

violates the 2-union-free property of $\mathcal{H}$. Therefore, there must be at least one hyperedge of
size at least 3, i.e. $t \ge 1$. Thus,

$$2q \ge 3t + 2(n - 1 - t) = 2n - 2 + t \ge 2n - 1.$$

Therefore $q \ge n$ and we are done. □
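
The gap between the two restrictions can be observed by exhaustive search on a very small case. The Python sketch below is only an illustration: the function names and the choice $n = 5$, $d = 3$ are ours, and elements are labelled $0, \ldots, n-1$. It checks that $n - 1 = 4$ singleton queries determine every defective 3-set, while no family of 4 queries of size exactly 2 does.

```python
from itertools import combinations


def separates(family, n, d):
    """True if all d-subsets of {0..n-1} give distinct answer vectors."""
    seen = set()
    for D in combinations(range(n), d):
        answers = tuple(any(x in A for x in D) for A in family)
        if answers in seen:
            return False
        seen.add(answers)
    return True


def exists_pair_family(n, d, q):
    """Search all families of q distinct 2-element queries for a d-separating one."""
    pairs = list(combinations(range(n), 2))
    return any(separates(fam, n, d) for fam in combinations(pairs, q))


n, d = 5, 3
singletons = [(x,) for x in range(n - 1)]

print(separates(singletons, n, d))      # queries of size at most 2 (here size 1)
print(exists_pair_family(n, d, n - 1))  # queries of size exactly 2
```

Repeating a query gives no new information, so restricting the search to families of distinct pairs loses nothing.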

4.2 Adaptive search 

We call the search model adaptive if we ask the query sets in a sequence and allow that
each query set $A$ may depend on the answers given for previous queries. As in the previous
sections we are particularly interested in the case when the query sets are of size at most $k$.
Let $Y$ be a defective set of at most $d$ elements. The minimum number of queries required
to determine $Y$ among a set of size $n$ in the adaptive model is denoted by $t(n, d, k)$. In the case
of $d = 1$ the question was solved completely by Katona [13].

Theorem 16 (Katona). Let $n, k$ be integers such that $k < n/2$, then










The proof of Theorem 16 can be easily generalized for larger defective sets.


Theorem 17. For any integers $k$, $n \ge k$, $d \ge 1$ it holds

$$\left\lceil \frac{n}{k} \right\rceil - 2 + \log \binom{k + 1}{d} \le t(n, d, k) \le \left\lceil \frac{n}{k} \right\rceil - 2 + d \lceil 1 + \log k \rceil.$$

Proof sketch. For the upper bound we simply ask disjoint query sets of size $k$. Whenever
we get a positive answer we perform a standard binary search on the $k$-set to determine
one of the defective elements. A second query is needed to determine if the $k - 1$ remaining
elements still contain a defective element. If so we can perform another binary search. If
not we continue with another disjoint set of size $k$. When there are $2k$ elements remaining
we can repeatedly perform a binary search to find the remaining defective elements.

For the lower bound suppose that we get negative answers for the first $\lceil n/k \rceil - 2$ many
queries. According to the information theory lower bound we need $\log \binom{k + 1}{d}$ queries to find
the at most $d$ defective elements among the remaining $k + 1$ elements. □
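
The block-and-binary-search strategy from the proof sketch can be simulated as follows. This Python sketch is a simplified variant under our own assumptions: elements are labelled $0, \ldots, n-1$, the name `adaptive_search` is ours, and the saving on the last $2k$ elements is not implemented. It therefore uses at most $\lceil n/k \rceil + |Y|(\lceil \log_2 k \rceil + 1)$ queries rather than the bound of the theorem.

```python
from math import ceil, log2


def adaptive_search(n, k, hidden):
    """Find the hidden defective set with adaptive queries of size at most k.

    Returns (found_set, number_of_queries).
    """
    hidden = set(hidden)
    count = 0

    def query(subset):
        # Oracle: is at least one defective element in the queried subset?
        nonlocal count
        assert 0 < len(subset) <= k
        count += 1
        return any(x in hidden for x in subset)

    found = set()
    for start in range(0, n, k):
        block = list(range(start, min(start + k, n)))
        # Re-query the block after each discovery until it tests negative.
        while block and query(block):
            cand = block
            while len(cand) > 1:  # binary search for one defective element
                half = cand[:len(cand) // 2]
                cand = half if query(half) else cand[len(cand) // 2:]
            found.add(cand[0])
            block = [x for x in block if x != cand[0]]
    return found, count


found, used = adaptive_search(20, 4, {3, 4, 17})
print(sorted(found), used)
```

Each block costs one initial query, and each defective element costs one binary search plus one follow-up query on the rest of its block.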